Skip to content

[fix][broker] Key_Shared subscription doesn't always deliver messages from the replay queue after a consumer disconnects and leaves a backlog#24732

Merged
lhotari merged 2 commits intoapache:branch-4.0from
nborisov:fix-key-shared-bug-23845
Sep 12, 2025
Merged

Conversation

@nborisov
Copy link
Contributor

@nborisov nborisov commented Sep 11, 2025

UPDATE: superseded by #24736 to master branch

Fixes #23845

Motivation

There is a not covered scenario for draining hashes and key shared subscriptions at PersistentStickyKeyDispatcherMultipleConsumers. The detailed scenario described at #23845 (comment)
As a result draining hashes could contain entries with consumer which was stopped. This leads consuming to get stuck.

Modifications

Decrease draining hash entry refCount in case its hash range returned to the initial owner which holds the entry in pending acks.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added integration test org.apache.pulsar.client.api.KeySharedSubscriptionTest#testMessageDeliveredFromDrainingHashes to verify the scenario
  • Existing unit tests were modified to check changes applied in PR: org.apache.pulsar.broker.service.ConsistentHashingStickyKeyConsumerSelectorTest#testShouldNotSwapExistingConsumers, org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerRemovedHashRanges_NoChanges, org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerUpdatedHashRanges_RangeAdded, org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerRemovedHashRanges_RangeUpdated, org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerUpdatedHashRanges_OverlappingRanges

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: nborisov#4

…deliver messages from the replay queue after a consumer disconnects and leaves a backlog unless new messages are produced
@nborisov
Copy link
Contributor Author

nborisov commented Sep 11, 2025

@lhotari FYI: all the checks passed in forked repository nborisov#4

@lhotari lhotari self-requested a review September 11, 2025 19:29
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job @nborisov! One minor detail to address.

@nborisov
Copy link
Contributor Author

@lhotari Thank you for the review and quick responses!

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎯 perfect job, @nborisov

@lhotari lhotari changed the title [fix][broker] Key_Shared or Shared subscription doesn't always deliver messages from the replay queue after a consumer disconnects and leaves a backlog unless new messages are produced [fix][broker] Key_Shared subscription doesn't always deliver messages from the replay queue after a consumer disconnects and leaves a backlog Sep 12, 2025
@lhotari lhotari merged commit ca8d338 into apache:branch-4.0 Sep 12, 2025
54 checks passed
@lhotari
Copy link
Member

lhotari commented Sep 12, 2025

btw. Didn't realize until now that this PR got merged into branch-4.0. @nborisov Please open a new PR with the same content to master branch since in Pulsar, the process goes from the master branch and committers handle cherry-picking and backporting and ask for backports when needed. I'll revert this from branch-4.0 and let's start from master branch.

@lhotari
Copy link
Member

lhotari commented Sep 12, 2025

@nborisov You will need to fork a new branch name from the PR branch (since this PR is now merged) and then rebase it on origin/master to get the PR commits there. I'd assume that the changes apply cleanly in rebasing. I'm sorry for the hassle, I didn't notice the target branch in merging. Before merging, it would have been possible to update the target branch.

@nborisov
Copy link
Contributor Author

@lhotari Sorry for misunderstanding the process! Preparing a new PR based on master branch.

@nborisov
Copy link
Contributor Author

@lhotari Created the new PR #24736. Please let me know if I it following the process or I need to make some changes.

ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24732)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
(cherry picked from commit ca8d338)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24732)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
(cherry picked from commit ca8d338)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants